{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ae374483",
   "metadata": {},
   "source": [
    "# Cataloguing voice-memos for a self managed personal assistant"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc4fa500665eccb9",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "\n",
    "Explore the capabilities of SuperDuperDB by effortlessly integrating models across various data modalities, including audio and text. This project aims to develop sophisticated data-based applications with minimal code complexity.\n",
    "\n",
    "### Objectives:\n",
    "\n",
    "1. Manage a database of audio recordings.\n",
    "2. Index the content of these audio recordings.\n",
    "3. Perform searches and queries on the content of these audio recordings.\n",
    "\n",
    "### Our approach involves:\n",
    "\n",
    "* Using a transformers model from Facebook's AI team for audio-to-text transcription.\n",
    "* Applying an OpenAI vectorization model to index the transcribed text.\n",
    "* Combining the OpenAI ChatGPT model with relevant recordings to query the audio database.\n",
    "\n",
    "Real-life use cases encompass personal note-taking, voice diaries, meeting transcriptions, language learning, task reminders, podcast indexing, knowledge base creation, journalism interviews, storytelling archives, and music catalog searches. \n",
    "\n",
    "In this example, we'll organize and catalog voice memos for a self-managed personal assistant using SuperDuperDB."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ecf9f0ec45cb1f3",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "Before diving into the implementation, ensure that you have the necessary libraries installed by running the following commands:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dce1a857",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!pip install superduperdb\n",
    "!pip install transformers soundfile torchaudio librosa openai\n",
    "!pip install -U datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d02e472-8395-435c-b46d-6a5158ef67fb",
   "metadata": {
    "tags": []
   },
   "source": [
    "Additionally, ensure that you have set your openai API key as an environment variable. You can uncomment the following code and add your API key:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "94262bf76c630b10",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "#os.environ['OPENAI_API_KEY'] = 'sk-...'\n",
    "if 'OPENAI_API_KEY' not in os.environ:\n",
    "    raise Exception('Environment variable \"OPENAI_API_KEY\" not set')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32971b8afdf76fe5",
   "metadata": {},
   "source": [
    "## Connect to datastore \n",
    "\n",
    "First, we need to establish a connection to a MongoDB datastore via SuperDuperDB. You can configure the `MongoDB_URI` based on your specific setup. \n",
    "Here are some examples of MongoDB URIs:\n",
    "\n",
    "* For testing (default connection): `mongomock://test`\n",
    "* Local MongoDB instance: `mongodb://localhost:27017`\n",
    "* MongoDB with authentication: `mongodb://superduper:superduper@mongodb:27017/documents`\n",
    "* MongoDB Atlas: `mongodb+srv://<username>:<password>@<atlas_cluster>/<database>`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b84da3f2ef58e401",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from superduperdb import superduper\n",
    "from superduperdb.backends.mongodb import Collection\n",
    "import os\n",
    "\n",
    "mongodb_uri = os.getenv(\"MONGODB_URI\",\"mongomock://test\")\n",
    "\n",
    "# Superdupers your database\n",
    "db = superduper(mongodb_uri)\n",
    "\n",
    "# Create a collection for Voice memos\n",
    "voice_collection = Collection('voice-memos')"
   ]
  },
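  {
   "cell_type": "markdown",
   "id": "sketch-uri-parts",
   "metadata": {},
   "source": [
    "The example URIs above share one layout: a scheme, optional credentials, a host, an optional port, and an optional database path. As a minimal sketch, you could inspect a URI before connecting. It uses only the Python standard library and is independent of SuperDuperDB; the helper name `describe_uri` is hypothetical.\n",
    "\n",
    "```python\n",
    "from urllib.parse import urlsplit\n",
    "\n",
    "\n",
    "def describe_uri(uri):\n",
    "    # Split a connection URI into its parts; the password is deliberately not returned\n",
    "    parts = urlsplit(uri)\n",
    "    return {\n",
    "        'scheme': parts.scheme,\n",
    "        'host': parts.hostname,\n",
    "        'port': parts.port,\n",
    "        'database': parts.path.lstrip('/') or None,\n",
    "        'has_credentials': parts.username is not None,\n",
    "    }\n",
    "\n",
    "\n",
    "print(describe_uri('mongomock://test'))\n",
    "print(describe_uri('mongodb://superduper:superduper@mongodb:27017/documents'))\n",
    "```\n",
    "\n",
    "A check like this can help catch a missing database name or an unexpected scheme early, without printing credentials."
   ]
  },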
  {
   "cell_type": "markdown",
   "id": "d13d051e8f5f6f",
   "metadata": {},
   "source": [
    "## Load Dataset\n",
    "\n",
    "In this example, we use the `LibriSpeech` dataset as our voice recording dataset, containing around 1000 hours of read English speech. Similar functionality can be achieved with any audio source, including audio hosted on the web or in an `s3` bucket. For instance, repositories of audio from conference calls or memos can be indexed in the same way."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10ab7114",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "from superduperdb.ext.numpy import array\n",
    "from superduperdb import Document\n",
    "\n",
    "# Load the LibriSpeech ASR demo data from Hugging Face datasets\n",
    "data = load_dataset(\"hf-internal-testing/librispeech_asr_demo\", \"clean\", split=\"validation\")\n",
    "\n",
    "# Create an `Encoder` for audio data\n",
    "enc = array('float64', shape=(None,))\n",
    "\n",
    "# Add the encoder to the SuperDuperDB instance\n",
    "db.add(enc)\n",
    "\n",
    "# Insert audio data into the MongoDB collection 'voice_collection'\n",
    "db.execute(voice_collection.insert_many([\n",
    "    # Create a SuperDuperDB Document for each audio sample\n",
    "    Document({'audio': enc(r['audio']['array'])}) for r in data\n",
    "]))"
   ]
  },
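  {
   "cell_type": "markdown",
   "id": "sketch-audio-duration",
   "metadata": {},
   "source": [
    "Each inserted document holds one recording as a one-dimensional array of samples. Recordings differ in length, which is presumably why the encoder above is declared with `shape=(None,)`. As a rough sketch, the duration of a recording follows from its sample count. The sketch uses the standard library only; the synthetic tone stands in for `r['audio']['array']`, the helper `duration_seconds` is hypothetical, and the 16 kHz rate is the one this notebook passes to the processor later.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "SAMPLING_RATE = 16000  # samples per second\n",
    "\n",
    "\n",
    "def duration_seconds(samples, sampling_rate=SAMPLING_RATE):\n",
    "    # Duration of a mono recording: number of samples divided by samples per second\n",
    "    return len(samples) / sampling_rate\n",
    "\n",
    "\n",
    "# Synthetic stand-in for r['audio']['array']: half a second of a 440 Hz tone\n",
    "tone = [math.sin(2 * math.pi * 440 * t / SAMPLING_RATE) for t in range(SAMPLING_RATE // 2)]\n",
    "\n",
    "print(len(tone), duration_seconds(tone))\n",
    "```\n",
    "\n",
    "If your own recordings use a different sampling rate, the duration computed this way is only correct when that rate is passed in."
   ]
  },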
  {
   "cell_type": "markdown",
   "id": "721f31f4626881e0",
   "metadata": {},
   "source": [
    "## Install Pre-Trained Model (LibriSpeech) with Database\n",
    "\n",
    "Apply a pre-trained `transformers` model to the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "222284f7",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from transformers import Speech2TextProcessor, Speech2TextForConditionalGeneration\n",
    "from superduperdb.ext.transformers import Pipeline\n",
    "\n",
    "# Load the pre-trained Speech2Text model and processor from Facebook's library\n",
    "model = Speech2TextForConditionalGeneration.from_pretrained(\"facebook/s2t-small-librispeech-asr\")\n",
    "processor = Speech2TextProcessor.from_pretrained(\"facebook/s2t-small-librispeech-asr\")\n",
    "\n",
    "# Define the sampling rate for the audio data\n",
    "SAMPLING_RATE = 16000\n",
    "\n",
    "# Create a SuperDuperDB pipeline for speech-to-text transcription\n",
    "transcriber = Pipeline(\n",
    "    identifier='transcription',\n",
    "    object=model,  # The pre-trained Speech2Text model\n",
    "    preprocess=processor,  # The processor for handling input audio data\n",
    "    preprocess_kwargs={'sampling_rate': SAMPLING_RATE, 'return_tensors': 'pt', 'padding': True},  # Preprocessing configurations\n",
    "    postprocess=lambda x: processor.batch_decode(x, skip_special_tokens=True),  # Postprocessing to convert model output to text\n",
    "    predict_method='generate',  # Specify the prediction method\n",
    "    preprocess_type='other',  # Specify the type of preprocessing\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed83a8b084844292",
   "metadata": {},
   "source": [
    "# Run Predictions on All Recordings in the Collection\n",
    "Apply the `Pipeline` to all audio recordings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "573dccc4",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "transcriber.predict(\n",
    "    X='audio',  # Specify the input feature name as 'audio'\n",
    "    db=db,  # Provide the SuperDuperDB instance\n",
    "    select=voice_collection.find(),  # Specify the collection of audio data to transcribe\n",
    "    max_chunk_size=10  # Set the maximum chunk size for processing audio data\n",
    ")"
   ]
  },
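  {
   "cell_type": "markdown",
   "id": "sketch-chunking",
   "metadata": {},
   "source": [
    "The `max_chunk_size=10` argument suggests that the recordings are processed in groups of at most ten rather than all at once, which keeps memory use bounded. The following is a minimal sketch of that general idea only. It is not SuperDuperDB's internal implementation, and `chunked` and `record_ids` are hypothetical names.\n",
    "\n",
    "```python\n",
    "def chunked(items, max_chunk_size):\n",
    "    # Yield successive slices of `items`, each with at most `max_chunk_size` elements\n",
    "    if max_chunk_size < 1:\n",
    "        raise ValueError('max_chunk_size must be at least 1')\n",
    "    for start in range(0, len(items), max_chunk_size):\n",
    "        yield items[start:start + max_chunk_size]\n",
    "\n",
    "\n",
    "record_ids = list(range(23))  # hypothetical record identifiers\n",
    "sizes = [len(chunk) for chunk in chunked(record_ids, 10)]\n",
    "print(sizes)  # [10, 10, 3]\n",
    "```\n",
    "\n",
    "A smaller chunk size lowers peak memory at the cost of more model calls; a larger one does the opposite."
   ]
  },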
  {
   "cell_type": "markdown",
   "id": "64a6cbd8e3d429d9",
   "metadata": {},
   "source": [
    "## Ask Questions to Your Voice Assistant\n",
    "\n",
    "Interact with your voice assistant by asking questions, leveraging the capabilities of MongoDB for vector-search and filtering rules:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3aedc03c",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from superduperdb import VectorIndex, Listener\n",
    "from superduperdb.ext.openai import OpenAIEmbedding\n",
    "\n",
    "# Create a VectorIndex with OpenAI embedding for audio transcriptions\n",
    "db.add(\n",
    "    VectorIndex(\n",
    "        identifier='my-index',  # Set a unique identifier for the VectorIndex\n",
    "        indexing_listener=Listener(\n",
    "            model=OpenAIEmbedding(model='text-embedding-ada-002'),  # Use OpenAIEmbedding for audio transcriptions\n",
    "            key='_outputs.audio.transcription.0',  # Specify the key for indexing the transcriptions in the output\n",
    "            select=voice_collection.find(),  # Select the collection of audio data to index\n",
    "        ),\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f92b56f",
   "metadata": {},
   "source": [
    "Let's verify the functionality by searching for the term \"royal cavern.\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7d2e3e56",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Define the search parameters\n",
    "search_term = 'royal cavern'  # Set the search term for audio transcriptions\n",
    "num_results = 2  # Set the number of desired search results\n",
    "\n",
    "# Execute a search query using the VectorIndex 'my-index'\n",
    "# Search for audio transcriptions similar to the specified search term\n",
    "# and retrieve the specified number of results\n",
    "search_results = list(\n",
    "    db.execute(\n",
    "        voice_collection.like(\n",
    "            {'_outputs.audio.transcription.0': search_term},\n",
    "            n=num_results,\n",
    "            vector_index='my-index',  # Use the 'my-index' VectorIndex for similarity search\n",
    "        ).find({}, {'_outputs.audio.transcription': 1})  # Retrieve only the 'transcription' field in the results\n",
    "    )\n",
    ")\n",
    "\n",
    "# Show the matching documents\n",
    "search_results"
   ]
  },
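  {
   "cell_type": "markdown",
   "id": "sketch-vector-search",
   "metadata": {},
   "source": [
    "Conceptually, a `like` query embeds the search term and returns the documents whose stored vectors are closest to it. The sketch below illustrates that ranking step with toy three-dimensional vectors and cosine similarity. Both are assumptions made for illustration: real embeddings have far more dimensions, and this notebook does not state which similarity measure the index uses. The names `cosine_similarity`, `top_n`, and `toy_vectors` are hypothetical.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def cosine_similarity(a, b):\n",
    "    # Cosine of the angle between two non-zero vectors of equal length\n",
    "    dot = sum(x * y for x, y in zip(a, b))\n",
    "    norm_a = math.sqrt(sum(x * x for x in a))\n",
    "    norm_b = math.sqrt(sum(y * y for y in b))\n",
    "    return dot / (norm_a * norm_b)\n",
    "\n",
    "\n",
    "def top_n(query, vectors, n):\n",
    "    # Rank stored vectors by similarity to the query and keep the n best keys\n",
    "    ranked = sorted(\n",
    "        vectors.items(),\n",
    "        key=lambda item: cosine_similarity(query, item[1]),\n",
    "        reverse=True,\n",
    "    )\n",
    "    return [key for key, _ in ranked[:n]]\n",
    "\n",
    "\n",
    "# Toy stand-ins for embedded transcriptions\n",
    "toy_vectors = {\n",
    "    'memo-a': [1.0, 0.0, 0.0],\n",
    "    'memo-b': [0.9, 0.1, 0.0],\n",
    "    'memo-c': [0.0, 1.0, 0.0],\n",
    "}\n",
    "\n",
    "print(top_n([1.0, 0.0, 0.0], toy_vectors, n=2))  # ['memo-a', 'memo-b']\n",
    "```\n",
    "\n",
    "The parameter `n` plays the same role as `n=num_results` in the query above: it caps how many of the closest matches are returned."
   ]
  },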
  {
   "cell_type": "markdown",
   "id": "6068514b31268846",
   "metadata": {},
   "source": [
    "## Enrich it with Chat-Completion\n",
    "\n",
    "Connect the previous steps with gpt-3.5.turbo, a chat-completion model on OpenAI. The goal is to enhance completions by seeding them with the most relevant audio recordings, determined by their textual transcriptions. Retrieve these transcriptions using the previously configured `VectorIndex`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "99e206af",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Import the OpenAIChatCompletion module from superduperdb.ext.openai\n",
    "from superduperdb.ext.openai import OpenAIChatCompletion\n",
    "\n",
    "# Create an instance of OpenAIChatCompletion with the GPT-3.5-turbo model\n",
    "chat = OpenAIChatCompletion(\n",
    "    model='gpt-3.5-turbo',\n",
    "    prompt=(\n",
    "        'Use the following facts to answer this question\\n'\n",
    "        '{context}\\n\\n'\n",
    "        'Here\\'s the question:\\n'\n",
    "    ),\n",
    ")\n",
    "\n",
    "# Add the OpenAIChatCompletion instance to the database\n",
    "db.add(chat)\n",
    "\n",
    "# Display the details of the added model in the database\n",
    "print(db.show('model'))"
   ]
  },
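  {
   "cell_type": "markdown",
   "id": "sketch-prompt-template",
   "metadata": {},
   "source": [
    "The prompt above contains a `{context}` placeholder that is meant to be filled with the retrieved transcriptions before the question is sent to the model. The sketch below shows one way such a template could be filled. How SuperDuperDB actually joins the context and attaches the question is not shown in this notebook, so the newline join, the appended question, and the helper name `build_prompt` are assumptions.\n",
    "\n",
    "```python\n",
    "PROMPT = (\n",
    "    'Use the following facts to answer this question\\n'\n",
    "    '{context}\\n\\n'\n",
    "    'Here\\'s the question:\\n'\n",
    ")\n",
    "\n",
    "\n",
    "def build_prompt(template, context_snippets, question):\n",
    "    # Join the retrieved snippets, substitute them for {context}, then append the question\n",
    "    context = '\\n'.join(context_snippets)\n",
    "    return template.format(context=context) + question\n",
    "\n",
    "\n",
    "snippets = ['first transcription', 'second transcription']  # placeholder text\n",
    "print(build_prompt(PROMPT, snippets, 'Is anything really Greek?'))\n",
    "```\n",
    "\n",
    "Printing the assembled prompt like this is a simple way to check what the model would see for a given set of retrieved snippets."
   ]
  },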
  {
   "cell_type": "markdown",
   "id": "9cb623c4",
   "metadata": {},
   "source": [
    "## Full Voice-Assistant Experience\n",
    "\n",
    "Evaluate the complete model by asking a question related to a specific fact mentioned in the audio recordings. The model will retrieve the most relevant recordings and utilize them to formulate its answer:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60d7f0af-6305-4c8c-be65-4b75ec7dbf50",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from superduperdb import Document\n",
    "\n",
    "# Define a question to ask the chat completion model\n",
    "question = 'Is anything really Greek?'\n",
    "\n",
    "# Use the db.predict method to get a response from the GPT-3.5-turbo model\n",
    "response = db.predict(\n",
    "    model_name='gpt-3.5-turbo',\n",
    "    \n",
    "    # Input the question to the chat completion model\n",
    "    input=question,\n",
    "    \n",
    "    # Select relevant context for the model from the SuperDuperDB collection of audio transcriptions\n",
    "    context_select=voice_collection.like(\n",
    "        Document({'_outputs.audio.transcription.0': question}), vector_index='my-index'\n",
    "    ).find(),\n",
    "    \n",
    "    # Specify the key in the context used by the model\n",
    "    context_key='_outputs.audio.transcription.0',\n",
    ")[0].content\n",
    "\n",
    "# Print the response obtained from the chat completion model\n",
    "print(response)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
